Homework 5 – UFO Sightings Visualization


🔗 Links

  • 📁 The Data
  • 🧠 The Analysis (Notebook)

Sample Rows from the Dataset

|   | datetime | city | state | country | shape | duration (seconds) | duration (hours/min) | comments | date posted | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1949-10-10 20:30:00 | san marcos | tx | us | cylinder | 2700.0 | 45 minutes | This event took place in early fall around 194... | 4/27/2004 | 29.883056 | -97.941111 |
| 1 | 1949-10-10 21:00:00 | lackland afb | tx | NaN | light | 7200.0 | 1-2 hrs | 1949 Lackland AFB&#44 TX. Lights racing acros... | 12/16/2005 | 29.384210 | -98.581082 |
| 2 | 1955-10-10 17:00:00 | chester (uk/england) | NaN | gb | circle | 20.0 | 20 seconds | Green/Orange circular disc over Chester&#44 En... | 1/21/2008 | 53.200000 | -2.916667 |
| 3 | 1956-10-10 21:00:00 | edna | tx | us | circle | 20.0 | 1/2 hour | My older brother and twin sister were leaving ... | 1/17/2004 | 28.978333 | -96.645833 |
| 4 | 1960-10-10 20:00:00 | kaneohe | hi | us | light | 900.0 | 15 minutes | AS a Marine 1st Lt. flying an FJ4B fighter/att... | 1/22/2004 | 21.418056 | -157.803611 |

Visualization 1: UFO Sightings Over Time by Shape (Interactive)

Write-up for Plot 1

This visualization shows the number of reported UFO sightings in the United States over time, broken down by the shape of the sighting. The data is grouped by month and shape, and displayed as a line chart with each line representing a different shape category (e.g., circle, triangle, fireball). A dropdown menu is included as an interactive element, allowing the viewer to filter and explore sightings of specific shapes.

For encoding types, I used:

Temporal encoding on the x-axis for year_month, formatted as a time field (T) to display a continuous monthly timeline.

Quantitative encoding on the y-axis to represent the count of sightings.

Nominal encoding for the color, using the shape field to distinguish different UFO shapes with separate lines and a color legend.

Regarding the color design, I used Altair’s default categorical color palette to keep the shapes distinguishable. Since shape is a nominal field with multiple categories, color was the most intuitive way to differentiate between them. This choice supports visual clarity when comparing multiple lines in the same space.

On the data transformation side, I first converted the datetime column to a valid datetime format, then dropped rows with missing values in the shape or datetime columns. I then created a new year_month field by converting the datetime to monthly periods and grouped the data by year_month and shape, counting the number of sightings per group. This grouped data was then used to construct the line chart.
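
A minimal pandas sketch of these cleaning and grouping steps is shown here. The toy rows, the variable names (raw, df, monthly_counts), and the use of errors="coerce" to turn unparseable dates into missing values are assumptions; the notebook may do this differently.

```python
import pandas as pd

# Toy rows standing in for the raw file; values are illustrative only.
raw = pd.DataFrame({
    "datetime": [
        "1949-10-10 20:30:00",
        "1949-10-10 21:00:00",
        "1949-10-25 22:00:00",
        "not a date",           # unparseable, becomes NaT below
        "1949-11-02 19:00:00",
    ],
    "shape": ["cylinder", "light", "light", "circle", None],
})

df = raw.copy()

# 1. Convert to a real datetime; invalid strings become NaT (assumed behavior).
df["datetime"] = pd.to_datetime(df["datetime"], errors="coerce")

# 2. Drop rows missing either the timestamp or the shape.
df = df.dropna(subset=["datetime", "shape"])

# 3. Bucket each sighting into its calendar month.
df["year_month"] = df["datetime"].dt.to_period("M")

# 4. Count sightings per (month, shape) pair.
monthly_counts = (
    df.groupby(["year_month", "shape"])
    .size()
    .reset_index(name="count")
)
```

In this toy input, the unparseable date and the row with no shape are both dropped, leaving one cylinder and two light sightings in October 1949.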

The interactivity is implemented using Altair’s selection_point with a dropdown menu bound to the shape field. This lets users explore trends for a specific UFO shape without overwhelming the chart with all lines at once. It enhances clarity and allows for focused comparison over time.

Visualization 2: UFO Sightings Over Time (Dropdown Filter by Shape)

Write-up for Plot 2

This visualization shows the trend of UFO sightings over time, grouped by shape and aggregated monthly. It leverages a dropdown menu to let users select a specific UFO shape and observe how its frequency changed over time. Unlike traditional legends, the dropdown keeps the view uncluttered and helps focus analysis on one shape at a time.

In terms of encoding, the chart uses:

Temporal encoding on the x-axis (year_month:T) to display a continuous time scale by month.

Quantitative encoding on the y-axis (count:Q) to show the number of sightings per shape.

Nominal color encoding (color='shape:N') to visually differentiate shapes when several are visible, although the dropdown usually keeps only one visible at a time.

I used Altair’s default color scheme for categorical variables, which is suitable for distinguishing nominal data like UFO shape types. This avoids visual ambiguity and makes shape comparisons clear when multiple lines are shown.

From a data transformation standpoint, I started by cleaning the dataset: converting the datetime column to a valid timestamp and dropping rows with missing shape or date values. I then created a new year_month column (monthly aggregation) and used a groupby operation on both year_month and shape to count the number of sightings for each shape per month. The grouped data was formatted to support Altair’s time-based plotting functions.
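
The write-up does not spell out what that final formatting step was. One plausible version, shown here purely as an assumption, is converting the monthly periods to plain timestamps so the year_month field can be used directly as a temporal (:T) field. The toy values and the name plot_ready are hypothetical.

```python
import pandas as pd

# Toy grouped output with a monthly Period column (illustrative values only).
monthly_counts = pd.DataFrame({
    "year_month": pd.PeriodIndex(["2004-01", "2004-02"], freq="M"),
    "shape": ["circle", "circle"],
    "count": [3, 5],
})

plot_ready = monthly_counts.copy()

# Convert each monthly period to the timestamp at the start of that month,
# giving an ordinary datetime column for a time axis.
plot_ready["year_month"] = plot_ready["year_month"].dt.to_timestamp()
```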

The interactivity here is implemented with a dropdown selector bound to the shape field, using Altair's selection_point. Setting empty='all' ensures that all shapes are shown by default, and filtering only occurs when the user selects a shape. This approach avoids overwhelming the user with too many overlapping lines and enhances clarity when analyzing individual trends.